Functional Partial Linear Regression 



Heng Lian 

Division of Mathematical Sciences, 
School of Physical and Mathematical Sciences, 
Nanyang Technological University, 
Singapore 637371. 



Abstract: When predicting scalar responses in the situation where the explana- 
tory variables are functions, it is sometimes the case that some functional vari- 
ables are related to responses linearly while other variables have more complicated 
relationships with the responses. In this paper, we propose a new semi-parametric 
model to take advantage of both parametric and nonparametric functional mod- 
eling. Asymptotic properties of the proposed estimators are established and finite 
sample behavior is investigated through a small simulation experiment.

1 Introduction

Since the introduction of the partial linear model by [3], it has been widely
studied in the statistical literature (see, e.g., [13, 14]). Partial linear models
belong to the class of semi-parametric models since they contain both parametric
and nonparametric components. On the one hand, they address the curse of
dimensionality problem associated with completely nonparametric models and
facilitate interpretation of the effect of the covariates associated with the
linear part. On the other hand, they are more flexible than the standard linear
regression when it is believed that some covariates are nonlinearly related to
the response.

In another direction of statistical research, there has recently been increased
interest in the statistical modeling of functional data. In many experiments,
functional data appear as the basic unit of observation. As a natural extension
of multivariate data analysis, functional data analysis provides valuable
insights into these problems. Compared with discrete multivariate analysis,
functional analysis takes into account the smoothness of the high dimensional
covariates, and often suggests new approaches to the problems that have not been
discovered before. Even for nonfunctional data, the functional approach can often
offer new perspectives on the old problem.

The literature contains an impressive range of functional analysis tools for
various problems including exploratory functional principal component analysis,
canonical correlation analysis, classification and regression. Two major
approaches exist. The more traditional approach, masterfully documented in the
monograph [15], typically starts by representing functional data by an expansion
with respect to a certain basis, and subsequent inferences are carried out on the
coefficients. The most commonly utilized bases include the B-spline basis for
non-periodic data and the Fourier basis for periodic data. Another line of work by
the French school [10], taking a nonparametric point of view, extends the
traditional nonparametric techniques, most notably the kernel estimate, to the
functional case. Some theoretical results are also obtained as a generalization of
the convergence properties of the classical kernel estimate. Some recent advances
in the area of functional regression include

In this paper, our aim is to combine the parametric and nonparametric
approaches to functional regression, resulting in functional partial linear
models. We are aware of two other works that introduced partial linear regression
in a functional context: the so-called semi-functional partial linear model [2]
and the partial functional linear model. The former combines a nonparametric
functional model with a standard linear regression component, while the latter
uses a functional linear model together with a standard linear regression model.
Both models have a functional component as well as a non-functional linear
component. To the best of our knowledge, our work is the first study that combines
the parametric and nonparametric approaches to functional regression in a
functional semi-parametric model.

In the next section, we present our new model and construct estimators for
both the parametric and nonparametric components based on principal component
regression and the Nadaraya-Watson kernel estimator. Then we derive some
consistency and convergence rate results for the two components. In Section 3,
we illustrate our methodology with a simulation study. Finally, in Section 4 we
conclude with a discussion of our findings. The technical proofs are collected in
the Appendix.



2 Functional partial linear models

In our functional partial linear regression model, the data triplets
$\{X_i, T_i, Y_i\}_{i=1}^n$, which are independent and identically distributed
(i.i.d.), are generated from the model

    $$Y_i = \int_0^1 b(s) X_i(s)\,ds + g(T_i) + \epsilon_i. \qquad (1)$$

Both $X_i$ and $T_i$ are random functions belonging to $H = L^2([0,1])$, the Hilbert
space containing square integrable functions defined on the unit interval with
inner product $\langle x, y \rangle = \int_0^1 x(s) y(s)\,ds$ for all $x, y \in H$.
Here $b \in H$ is the regression coefficient for the linear part, $g$ is a general
continuous function on $H$, and the mean zero errors $\epsilon_i$ are independent of
the functional covariates $\{X_i, T_i\}$. Note that for simplicity we assume $T_i$
and $X_i$ are both in $H$, while in fact we can assume that $T_i$ belongs to a more
general vectorial topological space on which a semimetric is defined. See
[10, 11] for more discussions on various possible semimetrics. We will use
$\{X, T, Y\}$ to denote the generic random variables with distribution the same as
$\{X_i, T_i, Y_i\}$, while the corresponding lower-case letters $\{x, t, y\}$ denote
nonrandom values that the random variables can assume. To ensure identifiability,
we do not put a scalar intercept term in the model since the intercept can be
incorporated into the nonparametric component. We also assume X is a mean zero
process.

To obtain estimators for both components, we get the following equation by
taking the conditional expectation of (1) given T:

    $$E(Y|T) = \langle b, E(X|T) \rangle + g(T).$$

Subtracting the above equation from (1) we get the model with only the linear
component:

    $$Y - E(Y|T) = \langle b, X - E(X|T) \rangle + \epsilon. \qquad (2)$$

As $E(Y|T_i)$ and $E(X|T_i)$ are unknown, we replace both expressions by
Nadaraya-Watson kernel estimators with

    $$\hat E(Y|T_i) = \frac{\sum_j K(\|T_i - T_j\|/h)\, Y_j}{\sum_j K(\|T_i - T_j\|/h)},$$

    $$\hat E(X|T_i) = \frac{\sum_j K(\|T_i - T_j\|/h)\, X_j}{\sum_j K(\|T_i - T_j\|/h)},$$

where K is the kernel function and h is the bandwidth that typically converges
to zero as n goes to infinity. We use the notations
$w_{ij} = K(\|T_i - T_j\|/h) / \sum_k K(\|T_i - T_k\|/h)$ and
$w(t, T_i) = K(\|t - T_i\|/h) / \sum_j K(\|t - T_j\|/h)$ below for convenience.
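
As a minimal sketch of how the weights $w_{ij}$ can be computed in practice, the
following assumes that each curve $T_i$ is observed on a common grid, that the
$L^2$ norm is approximated by the trapezoid rule, and that K is the quadratic
kernel $K(u) = 1 - u^2$ on [0, 1]. The function names, the toy curves and the
bandwidth value are illustrative assumptions, not part of the paper.

```python
import numpy as np

def trapezoid_weights(grid):
    """Quadrature weights so that sum(w * f(grid)) approximates the integral of f."""
    w = np.zeros(grid.size)
    dx = np.diff(grid)
    w[:-1] += dx / 2.0
    w[1:] += dx / 2.0
    return w

def l2_distances(T, grid):
    """Pairwise L2 distances between the curves stored in the rows of T."""
    diff = T[:, None, :] - T[None, :, :]            # shape (n, n, p)
    return np.sqrt((diff ** 2) @ trapezoid_weights(grid))

def quadratic_kernel(u):
    """Assumed kernel K(u) = 1 - u^2 on [0, 1], zero elsewhere."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.0 - u ** 2, 0.0)

def nw_weights(D, h):
    """Row-normalised weights w_ij = K(D_ij / h) / sum_k K(D_ik / h)."""
    K = quadratic_kernel(D / h)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 51)
T = np.sin(np.outer(rng.uniform(0.0, 2.0 * np.pi, 30), grid))   # toy curves
D = l2_distances(T, grid)
W = nw_weights(D, h=0.3)                                        # illustrative bandwidth
Y = rng.normal(size=30)
Y_smooth = W @ Y          # kernel estimates of E(Y | T_i), i = 1, ..., n
```

Each row of `W` sums to one, and the diagonal term $K(0) = 1$ guarantees that the
normalising sum is never zero.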

With the kernel estimators plugged into (2), we have formally the following
functional linear model

    $$\tilde Y_i = \langle b, \tilde X_i \rangle + \epsilon_i, \qquad (3)$$

with $\tilde Y_i = Y_i - \sum_j w_{ij} Y_j$ and $\tilde X_i = X_i - \sum_j w_{ij} X_j$.
Obviously (3) is the sample version of (2).

Following [5, 13], we define the second moment operator S by

    $$S = E[(X - E(X|T)) \otimes (X - E(X|T))],$$

with the interpretation of S as a mapping from H to H:
$S(x) = E[\langle X - E(X|T), x \rangle (X - E(X|T))]$ for all $x \in H$. We also
define the cross second moment operator $\Delta$ by

    $$\Delta = E[(X - E(X|T))(Y - E(Y|T))].$$

The sample version of S is $\hat S = \frac{1}{n} \sum_{i=1}^n \tilde X_i \otimes \tilde X_i$,
and $\hat\Delta$ can be defined similarly.

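Using the Karhunen-Loève expansion, we can write

    $$X - E(X|T) = \sum_{j=1}^{\infty} \langle X - E(X|T), \phi_j \rangle\, \phi_j$$

and

    $$S = \sum_{j=1}^{\infty} \lambda_j\, \phi_j \otimes \phi_j,$$

with $\lambda_1 > \lambda_2 > \cdots$ the eigenvalues and $\phi_1, \phi_2, \ldots$ the
orthonormal eigenvectors associated with S. Similarly for
$\hat\lambda_1 > \hat\lambda_2 > \cdots$ and $\hat\phi_1, \hat\phi_2, \ldots$ associated
with the sample version operator $\hat S$.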

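Multiplying both sides of (2) by $X - E(X|T)$ and taking expectations, we get
$S(b) = \Delta$. If we expand the different quantities in terms of the orthonormal
system $\{\phi_j\}$, we have the representations $b = \sum_j b_j \phi_j$ and
$\Delta = \sum_j \Delta_j \phi_j$ with the relation $b_j = \Delta_j / \lambda_j$, which
leads to the principal component analysis based estimator used in, e.g., [1]:

    $$\hat b = \sum_{j=1}^{m} \hat b_j \hat\phi_j,$$

where $\hat b_j = \langle \hat\Delta, \hat\phi_j \rangle / \hat\lambda_j$ and $m < n$ is the
truncation level that trades off approximation error against variability, and m
typically diverges with n.

Finally, the nonparametric component g can be estimated as

    $$\hat g(t) = \sum_j w(t, T_j)\,(Y_j - \langle \hat b, X_j \rangle).$$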
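
The two-step procedure described above can be sketched as follows. This is only
an illustration under stated assumptions: curves are stored row-wise on a common
grid, integrals are approximated by the trapezoid rule, the kernel is taken to be
$K(u) = 1 - u^2$ on [0, 1], and the function name `fit_fplm` is hypothetical. It
is not the implementation used for the simulations in Section 3, which relies on
the fda and npfda software.

```python
import numpy as np

def fit_fplm(X, T, Y, grid, m, h):
    """Return (b_hat evaluated on grid, g_hat evaluated at each observed T_i)."""
    n = len(Y)
    q = np.zeros(grid.size)                      # trapezoid quadrature weights
    dx = np.diff(grid)
    q[:-1] += dx / 2.0
    q[1:] += dx / 2.0

    # Nadaraya-Watson weights w_ij from L2 distances between the T curves
    diff = T[:, None, :] - T[None, :, :]
    U = np.sqrt((diff ** 2) @ q) / h
    K = np.where(U <= 1.0, 1.0 - U ** 2, 0.0)    # assumed quadratic kernel
    W = K / K.sum(axis=1, keepdims=True)

    # Remove the kernel estimates of E(X | T_i) and E(Y | T_i)
    X_t = X - W @ X
    Y_t = Y - W @ Y

    # Sample operators: covariance kernel of S_hat and the function Delta_hat
    C = X_t.T @ X_t / n
    delta = X_t.T @ Y_t / n

    # Eigen-decomposition of S_hat with respect to the L2 inner product
    sq = np.sqrt(q)
    lam, V = np.linalg.eigh(sq[:, None] * C * sq[None, :])
    top = np.argsort(lam)[::-1][:m]
    lam = lam[top]
    phi = V[:, top] / sq[:, None]                # columns are L2-orthonormal

    # b_hat_j = <Delta_hat, phi_hat_j> / lambda_hat_j, truncated at level m
    coef = (phi.T @ (q * delta)) / lam
    b_hat = phi @ coef

    # g_hat(T_i) = sum_j w_ij (Y_j - <b_hat, X_j>)
    g_hat = W @ (Y - X @ (q * b_hat))
    return b_hat, g_hat
```

In this sketch the truncation level `m` and the bandwidth `h` are supplied by the
caller; choosing `m` larger than the numerical rank of the centred curves would
divide by eigenvalues that are close to zero.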

Next we study consistency and rate of convergence for the proposed estimators.
Before doing that, we state a simple model identifiability result which only
requires the positive definiteness of the operator S, which will be assumed
throughout the paper.

Proposition 1 Assume that the operator S is positive definite (i.e.
$\lambda_j > 0$ for all j); then model (1) is identifiable. Specifically,
$E(Y|X,T) = \langle b_1, X \rangle + g_1(T) = \langle b_2, X \rangle + g_2(T)$ implies that
$b_1 = b_2$ and $g_1 = g_2$ on the support of the distribution of T.

The assumptions required for our consistency result are stated as follows. 

(A) $\int E(X^4) < \infty$, $E\epsilon^2 < \infty$ and the distribution of T is
supported on a compact subset of $L^2([0, 1])$.

(B) $\lambda_1 > \lambda_2 > \lambda_3 > \cdots > 0$, i.e. the eigenvalues are positive
with multiplicity 1.

(C) The kernel K satisfies the usual condition: K has support [0, 1], is
continuous on $[0, \infty)$, and $-K'(u)$, $u \in (0, 1)$, is positive and bounded
away from zero.

(D) The function $g(t)$ in model (1) and $h(t) = E(X|T = t)$ are Lipschitz
continuous of order $\gamma$: $|g(t_1) - g(t_2)| \le C\|t_1 - t_2\|^\gamma$,
$\|h(t_1) - h(t_2)\| \le C\|t_1 - t_2\|^\gamma$.

(E) The bandwidth h satisfies $h \to 0$ and $n\phi(h) \to \infty$, where $\phi(h)$ is
the asymptotic order of the so-called small ball probability, that is,
$c_0\phi(h) \le P(\|T - t\| \le h) \le c_1\phi(h)$ for some $c_0, c_1 > 0$ and for
all t in the support of the distribution of T.

(F) $m \to \infty$, $k_n^{-1}\lambda_m^2 \to \infty$, where
$k_n = h^\gamma + 1/\sqrt{n\phi(h)}$.

(G) $k_n^{-1}\lambda_m / (\sum_{j=1}^m \delta_j^{-1}) \to \infty$, where
$\delta_j = \min_{1 \le k \le j}(\lambda_k - \lambda_{k+1})$.

The consistency proof for the theorem below makes use of existing results
for the functional linear model [5], but the assumptions we need are stronger due
to the presence of the nonparametric component.

Theorem 1 Suppose that assumptions (A)-(G) are satisfied, then
$\|\hat b - b\| + |\hat g(t) - g(t)| \to 0$ in probability.

To calculate the rates of convergence, we make the following additional
assumptions on the various Fourier coefficients defined previously:

(H) $\lambda_j - \lambda_{j+1} \ge C^{-1} j^{-\alpha-1}$, $|b_j| \le C j^{-\beta}$ for some
$C > 0$, $\alpha > 1$, $\beta > 1$.

Theorem 2 Under assumptions (A)-(H), we have the convergence rates (for
convergence in probability)
$\|\hat b - b\|^2 = O_p(k_n^2 m^{4\alpha+3} + m^{-2\beta+1})$ and
$|\hat g(t) - g(t)|^2 = O_p(k_n^2 m^{4\alpha+3} + m^{-2\beta+1})$.

Remark 1 With only the parametric component, it has been shown that the optimal
rate for $\|\hat b - b\|^2$ is $O_p(n^{-(2\beta-1)/(\alpha+2\beta)})$ if
$\beta > 1 + \alpha/2$. With only the nonparametric component, [11] obtained the rate
$O_p(k_n^2)$ (if their results are adapted to the case of convergence in
probability instead of almost surely). Our asymptotic results above show that in
a functional partial linear model we can only obtain a substantially slower rate.
Further discussions on this point are made in Section 4.


3 Simulation 

In this section, we provide a numerical example to illustrate the methodology
and theory presented previously. We simulate samples $(X_i, T_i, Y_i)$ from model
(1). For the linear component, we take $b = \sum_j b_j \phi_j$ with $b_1 = 0.5$,
$b_j = 4j^{-2}$ for $j \ge 2$, $\phi_1 = 1$, $\phi_j(t) = \sqrt{2}\cos((j-1)\pi t)$ for
$j \ge 2$, and $X = \sum_j \xi_j a_j \phi_j$ with $\xi_j$ independent and uniformly
distributed on $[-\sqrt{3}, \sqrt{3}]$ and scaling constants $a_j$. For the
nonparametric component, we use

g{t) = [ - cos(7rs))ds 

Jo 

and the random covariate curves for the nonparametric component are simulated
marginally from

    $$T(s) = \sin(\omega s) + (a - \pi)s + d, \quad \omega \sim \mathrm{Unif}(0, 2\pi), \quad a, d \sim \mathrm{Unif}(0, 1).$$

To introduce some dependence between X and T, we set $a = \xi_1/(2\sqrt{3}) + 1/2$
and $d = \xi_2/(2\sqrt{3}) + 1/2$. Finally, Gaussian errors with standard deviation
0.5 are added to produce the final responses.
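
The construction of the curves T, including the dependence on X through the
first two uniform scores, can be sketched as below. The grid size, the sample
size and the number of score columns are illustrative assumptions, and the
function name `simulate_T` is hypothetical.

```python
import numpy as np

def simulate_T(xi, grid, rng):
    """Draw T(s) = sin(omega * s) + (a - pi) * s + d, with a and d tied to xi."""
    n = xi.shape[0]
    omega = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # xi_1, xi_2 are Unif[-sqrt(3), sqrt(3)], so a and d are Unif(0, 1)
    a = xi[:, 0] / (2.0 * np.sqrt(3.0)) + 0.5
    d = xi[:, 1] / (2.0 * np.sqrt(3.0)) + 0.5
    T = np.sin(np.outer(omega, grid)) + np.outer(a - np.pi, grid) + d[:, None]
    return T, a, d

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
# scores xi_j of X; only the first two columns enter T
xi = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(100, 5))
T, a, d = simulate_T(xi, grid, rng)
```

Because $\xi_1/(2\sqrt{3})$ is uniform on [-1/2, 1/2], the shift by 1/2 reproduces
the marginal Unif(0, 1) distribution stated for a and d.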

To assess the performance of the procedure, we consider the following error
criteria:

    $$\mathrm{MSE}_1 = \|\hat b - b\|^2,$$

    $$\mathrm{MSE}_2 = \sum_{i=1}^n (\hat g(T_i) - g(T_i))^2 / n,$$

    $$\mathrm{MSE}_3 = \sum_{i=1}^n (\langle \hat b - b, X_i \rangle + \hat g(T_i) - g(T_i))^2 / n,$$

which represent the errors for the functional linear coefficient, the nonlinear
component of the regression function and the regression function respectively.
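
The three criteria can be evaluated on discretised curves along the following
lines. This sketch assumes a common grid with trapezoid-rule integration and
takes the first criterion to be the squared $L^2$ distance between the estimated
and true coefficient functions, which is an assumption; the function name
`error_criteria` is hypothetical.

```python
import numpy as np

def error_criteria(b_hat, b, g_hat_vals, g_vals, X, grid):
    """Return (MSE1, MSE2, MSE3) for curves sampled on a common grid."""
    q = np.zeros(grid.size)                  # trapezoid quadrature weights
    dx = np.diff(grid)
    q[:-1] += dx / 2.0
    q[1:] += dx / 2.0

    diff = b_hat - b
    mse1 = float(np.sum(q * diff ** 2))                   # assumed: ||b_hat - b||^2
    mse2 = float(np.mean((g_hat_vals - g_vals) ** 2))     # error of g at the T_i
    lin_err = X @ (q * diff)                              # <b_hat - b, X_i>
    mse3 = float(np.mean((lin_err + g_hat_vals - g_vals) ** 2))
    return mse1, mse2, mse3
```

The third criterion combines both components, so errors in the linear and
nonparametric parts can partly cancel or reinforce each other.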
In the implementation, for the parametric linear component, we use B-splines
of order 4 with 20 equi-spaced knots to represent the functional covariates with
no additional smoothing (since no error is contained in the covariates).
Functional principal component analysis is performed using the R package fda
(http://ego.psych.mcgill.ca/misc/fda/software.html). For the nonparametric
component, we use the quadratic kernel, with estimation performed using the
npfda package (http://www.math.univ-toulouse.fr/staph/npfda/).

We present the simulation results for n = 100 and n = 500 in Tables 1 and 2
respectively, with different truncation levels m and different bandwidth
parameters h. In the tables, the bandwidth h is the median of pairwise distances
among the functional covariates, i.e., $h = \mathrm{med}_{i<j}\{\|T_i - T_j\|\}$. For a
given sample size n, our results represent averages over 100 Monte Carlo
replications for each parameter setting. The three numbers for each parameter
setting correspond to the three error measures above. We note that for different
error measures, the minimum errors are achieved at different parameter settings.
We also show in Figure 1 the estimated linear coefficient $\hat b$ using the optimal
parameter settings (minimizing $\mathrm{MSE}_1$) for both sample sizes.
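
The median-of-pairwise-distances bandwidth mentioned above can be computed as in
the following sketch, which assumes curves sampled on a common grid and the
$L^2$ norm approximated by the trapezoid rule. The function name
`median_pairwise_bandwidth` is hypothetical.

```python
import numpy as np

def median_pairwise_bandwidth(T, grid):
    """Median of the L2 distances ||T_i - T_j|| over all pairs i < j."""
    q = np.zeros(grid.size)                  # trapezoid quadrature weights
    dx = np.diff(grid)
    q[:-1] += dx / 2.0
    q[1:] += dx / 2.0
    diff = T[:, None, :] - T[None, :, :]     # shape (n, n, p)
    D = np.sqrt((diff ** 2) @ q)
    iu = np.triu_indices(T.shape[0], k=1)    # index pairs with i < j
    return float(np.median(D[iu]))
```

For three constant curves with values 0, 1 and 3 the pairwise distances are 1, 3
and 2, so the function returns 2.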

We then compare the performance of completely parametric and completely
nonparametric estimators with the same data generated from the true model (1).
That is, we concatenate $X_i$ and $T_i$ and consider the new covariate as defined on
the interval [0, 2] and then apply the two approaches for estimating the
regression function. For these two estimators, only the mean squared error for
the regression function ($\mathrm{MSE}_3$ above) makes sense, which is presented in
Tables 3 and 4 for the two estimators respectively. When the true model is
partially linear, the completely linear model is clearly misspecified and results
in extremely large mean squared errors. The completely nonparametric estimator
is also not as good as the partial linear estimator since it loses some
efficiency when X is in fact linearly related to the responses.

Table 1: Simulation results (MSE) for our functional partial linear regression
model when n = 100. The minimum errors are emphasized with bold-face font.

Table 2: Simulation results (MSE) for our functional partial linear regression
model when n = 500.

Table 3: Simulation results (MSE) using data generated from the partial linear
model but fitted using functional linear regression when n = 100.

Table 4: Simulation results (MSE) using data generated from the partial linear
model but fitted using completely nonparametric regression when n = 100.

Figure 1: Estimated functional linear coefficient (dotted line) with different
sample sizes: (a) n = 100; (b) n = 500.

4 Conclusion

In this paper we initiate a study on functional partial linear models where both
components are functional in nature. Consistency and convergence rates are
obtained. Unlike the traditional partial linear model, where the convergence
rates for either component are the same under mild regularity conditions whether
the other component is known or not, here for our functional model the rates
obtained are worse than those of completely parametric or nonparametric models.
From the proofs, this decrease in rate is caused by the convergence rate of
$\|\hat S - S\|$, which in the completely parametric case is $O_p(1/\sqrt{n})$
([1, 13]), while the unknown nonparametric component in our model makes the rate
slower (Lemma 1 in the Appendix). Although we do not have any corresponding lower
bounds on the rates of convergence, it is reasonable to conjecture that the
optimal rate cannot be achieved when the parametric component is infinite
dimensional as in our functional model.

In our estimation procedure, we need to choose both the number of principal 
components for the parametric part and the bandwidth for the nonparametric 
part. Although we do not consider automatic selection for these parameters in the 
current study, we could use standard techniques such as K-fold cross-validation. 
With two parameters to search over, it remains to be seen whether we can get
reasonable performance with limited computational resources. Another open
question is the construction of confidence bands for either the parametric or the
nonparametric component. From a conceptual point of view, the bootstrap method
seems to be viable, but its computational and theoretical properties remain a
challenge. All these problems deserve further investigation.

Appendix 

Proof of Proposition 1. If
$E(Y|X,T) = \langle b_1, X \rangle + g_1(T) = \langle b_2, X \rangle + g_2(T)$, since
$E(Y - \langle b_1, X \rangle - g_1(T))^2 = E(Y - \langle b_2, X \rangle - g_2(T))^2 + \langle S(b_1 - b_2), b_1 - b_2 \rangle$,
we have $b_1 = b_2$ by the positive definiteness of S. Then $g_1 = g_2$ follows from
$g_j(T) = E[Y - \langle b_j, X \rangle | T]$, j = 1, 2.

For any operator $U : H_1 \to H_2$ which is a linear mapping between two Hilbert
spaces, we consider the operator norm
$\|U\| = \sup_{\|x\|_{H_1} \le 1} \|U(x)\|_{H_2}$. Note that there is no confusion when
we use $\|\cdot\|$ for both the operator norm and the norm when $H_2$ is the real
line because of the Riesz representation theorem. The following lemma gives the
convergence rates for the operators $\hat S$ and $\hat\Delta$.

Lemma 1 Under the assumptions (A), (C)-(E) stated in Section 2, we have

    $$\|\hat S - S\| = O_p(k_n)$$

and

    $$\|\hat\Delta - \Delta\| = O_p(k_n),$$

where $k_n = h^\gamma + (n\phi(h))^{-1/2}$.

Proof. By the definition of the operator $\hat S$, we have

    $$\hat S = \frac{1}{n}\sum_{i=1}^n \Big(X_i - \sum_j w_{ij}X_j\Big) \otimes \Big(X_i - \sum_j w_{ij}X_j\Big)$$
    $$= \frac{1}{n}\sum_{i=1}^n (X_i - E(X|T_i)) \otimes (X_i - E(X|T_i))$$
    $$+ \frac{1}{n}\sum_{i=1}^n \Big(E(X|T_i) - \sum_j w_{ij}X_j\Big) \otimes (X_i - E(X|T_i))$$
    $$+ \frac{1}{n}\sum_{i=1}^n (X_i - E(X|T_i)) \otimes \Big(E(X|T_i) - \sum_j w_{ij}X_j\Big)$$
    $$+ \frac{1}{n}\sum_{i=1}^n \Big(E(X|T_i) - \sum_j w_{ij}X_j\Big) \otimes \Big(E(X|T_i) - \sum_j w_{ij}X_j\Big)$$
    $$=: S_1 + S_2 + S_3 + S_4.$$

Lemma 5.2 in [5] showed that $\|S_1 - S\| = O_p(n^{-1/2}) = O_p(k_n)$. It can be
shown that $\max_i \|E(X|T_i) - \sum_j w_{ij}X_j\| = O_p(k_n)$. The proof of this fact
is similar to that of [10, 11] but is in fact simpler, since we only need to use
Markov's inequality to show convergence in probability instead of using
Bernstein's inequality to show almost sure convergence. The extra log n factor
does not appear for the same reason when we are only interested in showing
convergence in probability. Thus all three terms $S_2$, $S_3$ and $S_4$ are of order
$O_p(k_n)$ and the rate for $\|\hat S - S\|$ is shown. The proof for
$\|\hat\Delta - \Delta\|$ is similar and thus omitted.

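Proof of Theorem 1. Let $b^{(m)} = \sum_{j=1}^m b_j\phi_j$, then
$\|b^{(m)} - b\| \to 0$ as $m \to \infty$. During the proof for consistency of
functional linear models, [5] showed that

    $$\|\hat b - b^{(m)}\| \le C\Big(\frac{1}{\lambda_m^2} + \frac{\sum_{j=1}^m \delta_j^{-1}}{\lambda_m}\Big)\|\Delta\| \cdot \|\hat S - S\| + \frac{2}{\lambda_m}\|\hat\Delta - \Delta\| \qquad (4)$$

on the event $\{|\hat\lambda_m - \lambda_m| < \lambda_m/2\}$. For any $\epsilon > 0$, we have

    $$P\Big(\|\hat S - S\| > \Big(\frac{1}{\lambda_m^2} + \frac{\sum_{j=1}^m \delta_j^{-1}}{\lambda_m}\Big)^{-1}\epsilon\Big)
      = P\Big(k_n^{-1}\|\hat S - S\| > k_n^{-1}\Big(\frac{1}{\lambda_m^2} + \frac{\sum_{j=1}^m \delta_j^{-1}}{\lambda_m}\Big)^{-1}\epsilon\Big) \to 0 \qquad (5)$$

using Lemma 1, since
$k_n^{-1}(1/\lambda_m^2 + \sum_{j=1}^m \delta_j^{-1}/\lambda_m)^{-1} \to \infty$ by assumptions
(F) and (G). Similarly we have

    $$P(\|\hat\Delta - \Delta\| > \lambda_m\epsilon) \to 0. \qquad (6)$$

Finally,

    $$P(|\hat\lambda_m - \lambda_m| \ge \lambda_m/2) \le P(\|\hat S - S\| \ge \lambda_m/2) \to 0 \qquad (7)$$

by Lemma 1 and assumption (F). Equations (4)-(7) together imply the consistency
result for $\hat b$.

For $|\hat g(t) - g(t)|$, one only needs to note that

    $$|\hat g(t) - g(t)| \le |g^*(t) - g(t)| + \Big\|\sum_i w(t, T_i)X_i\Big\| \cdot \|\hat b - b\|, \qquad (8)$$

where $g^*(t) = \sum_i w(t, T_i)(g(T_i) + \epsilon_i)$. The by now standard results in
[10, 11] tell us $|g^*(t) - g(t)| = O_p(k_n)$.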

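Proof of Theorem 2. In the proof, C denotes a generic constant that can assume
different values at the different places it appears. First we note that directly
using equation (5) results in the slower rate
$\|\hat b - b\|^2 = O_p(k_n^2 m^{4\alpha+4} + m^{-2\beta+1})$. Instead, we use the
decomposition bound

    $$\|\hat b - b\|^2 = \Big\|\sum_{j=1}^m \hat b_j\hat\phi_j - b\Big\|^2
      \le C\sum_{j=1}^m (\hat b_j - b_j)^2 + C\Big\|\sum_{j=1}^m b_j\hat\phi_j - b\Big\|^2$$
    $$\le C\sum_{j=1}^m \Big(\frac{\hat\Delta_j}{\hat\lambda_j} - \frac{\Delta_j}{\hat\lambda_j}\Big)^2
      + C\sum_{j=1}^m \Big(\frac{\Delta_j}{\hat\lambda_j} - \frac{\Delta_j}{\lambda_j}\Big)^2
      + C\Big\|\sum_{j=1}^m b_j(\hat\phi_j - \phi_j)\Big\|^2
      + C\sum_{j=m+1}^{\infty} b_j^2$$
    $$=: A_1 + A_2 + A_3 + A_4,$$

where $\hat\Delta_j = \langle \hat\Delta, \hat\phi_j \rangle$ and
$\Delta_j = \langle \Delta, \phi_j \rangle = \lambda_j b_j$.

On the event $\{|\hat\lambda_j - \lambda_j| < \lambda_j/2, j \le m\}$, which happens with
probability converging to 1, and using the facts
$|\hat\lambda_j - \lambda_j| \le \|\hat S - S\|$ and
$\|\hat\phi_j - \phi_j\| \le 2\sqrt{2}\,\|\hat S - S\|/\delta_j$ ([1, 12]), where $\delta_j$
is defined in assumption (G), together with Lemma 1 and the bounds
$\lambda_j^{-1} \le Cj^{\alpha}$ and $\delta_j^{-1} \le Cj^{\alpha+1}$ implied by assumption
(H), we have

    $$A_1 \le C\sum_{j=1}^m \frac{\|\hat\Delta - \Delta\|^2 + \|\Delta\|^2\|\hat\phi_j - \phi_j\|^2}{\lambda_j^2}
      = O_p\Big(k_n^2\sum_{j=1}^m j^{4\alpha+2}\Big) = O_p(k_n^2 m^{4\alpha+3}),$$

    $$A_2 \le C\sum_{j=1}^m \frac{(\hat\lambda_j - \lambda_j)^2}{\lambda_j^4}\Delta_j^2
      = C\sum_{j=1}^m \frac{(\hat\lambda_j - \lambda_j)^2}{\lambda_j^2} b_j^2
      = O_p\Big(k_n^2\sum_{j=1}^m j^{2\alpha-2\beta}\Big),$$

    $$A_3 \le Cm\sum_{j=1}^m b_j^2\|\hat\phi_j - \phi_j\|^2
      \le Cm\,\|\hat S - S\|^2\sum_{j=1}^m \frac{b_j^2}{\delta_j^2}
      = O_p\Big(k_n^2 m\sum_{j=1}^m j^{2\alpha+2-2\beta}\Big),$$

    $$A_4 \le C\sum_{j=m+1}^{\infty} j^{-2\beta} = O(m^{-2\beta+1}).$$

Since $\beta > 1$, the bounds for $A_2$ and $A_3$ are dominated by the bound for
$A_1$. The conclusion $\|\hat b - b\|^2 = O_p(k_n^2 m^{4\alpha+3} + m^{-2\beta+1})$ now
directly follows from the above bounds for $A_i$, i = 1, 2, 3, 4.

Finally, the convergence rate for the nonparametric component follows directly
from (8).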